A Statistical Perspective on Randomized Sketching for Ordinary Least-Squares

Authors

  • Garvesh Raskutti
  • Michael W. Mahoney
Abstract

We consider statistical aspects of solving large-scale least-squares (LS) problems using randomized sketching algorithms. For an LS problem with input data (X, Y) ∈ R^{n×p} × R^n, where n and p are both large and n ≫ p, sketching algorithms use a “sketching matrix,” S ∈ R^{r×n}, where r ≪ n, e.g., a matrix representing the process of random sampling or random projection. Then, rather than solving the LS problem using the full data (X, Y), sketching algorithms solve the LS problem using only the “sketched data” (SX, SY) ∈ R^{r×p} × R^r. Prior work has typically adopted an algorithmic perspective, in that it has made no statistical assumptions on the input X and Y, and instead it has assumed that the data (X, Y) are fixed and worst-case. In this paper, we adopt a statistical perspective, and we consider the mean-squared error performance of randomized sketching algorithms when the data (X, Y) are generated according to a statistical linear model Y = Xβ + ε, where ε is a noise process. To do this, we first develop a framework for assessing, in a unified manner, algorithmic and statistical aspects of randomized sketching methods. We then consider the statistical prediction efficiency (SPE) and the statistical residual efficiency (SRE) of the sketched LS estimator, and we use our framework to provide results for several types of random projection and random sampling sketching algorithms. Among other results, we show that the SRE can be bounded when p ≲ r ≪ n, but that the SPE typically requires the sample size r to be substantially larger. Our theoretical results reveal that, depending on the specifics of the situation, leverage-based sampling methods can perform as well as or better than projection methods. Our empirical results reveal that when r is only slightly greater than p and much less than n, projection-based methods out-perform sampling-based methods, but as r grows, sampling methods start to out-perform projection methods.
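To make the setup concrete, the following is a minimal NumPy sketch (not the paper's code) of sketched least squares with a Gaussian random-projection matrix, compared against full-data OLS under the linear model Y = Xβ + ε. The single-realization ratios printed at the end are informal analogues of SPE and SRE, assuming those efficiencies are defined relative to the full-data OLS estimator; the problem sizes and the 1/√r scaling of S are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Problem sizes: n >> p, with sketch size p < r << n (values are illustrative only).
n, p, r = 20000, 50, 500

# Statistical linear model Y = X beta + eps.
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)
Y = X @ beta + rng.standard_normal(n)

# Full-data OLS estimate.
beta_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Gaussian random-projection sketch S in R^{r x n}; solve LS on (SX, SY) only.
S = rng.standard_normal((r, n)) / np.sqrt(r)
beta_sk, *_ = np.linalg.lstsq(S @ X, S @ Y, rcond=None)

# Single-realization analogues of the efficiencies:
#   SPE ~ ||X beta_sk - X beta||^2 / ||X beta_ols - X beta||^2
#   SRE ~ ||Y - X beta_sk||^2     / ||Y - X beta_ols||^2
spe = np.sum((X @ (beta_sk - beta)) ** 2) / np.sum((X @ (beta_ols - beta)) ** 2)
sre = np.sum((Y - X @ beta_sk) ** 2) / np.sum((Y - X @ beta_ols) ** 2)
print(f"SPE ~ {spe:.2f}, SRE ~ {sre:.2f}")
```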


Similar Articles

Statistical and Algorithmic Perspectives on Randomized Sketching for Ordinary Least-Squares

We consider statistical and algorithmic aspects of solving large-scale least-squares (LS) problems using randomized sketching algorithms. Prior results show that, from an algorithmic perspective, when using sketching matrices constructed from random projections and leverage-score sampling, if the number of samples r is much smaller than the original sample size n, then the worst-case (WC) error is...
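As background for the leverage-score sampling mentioned above, here is a minimal, hedged illustration of how such a sketching matrix can be constructed: the leverage scores are the squared row norms of the left singular vectors of X, rows are sampled with replacement in proportion to them, and sampled rows are rescaled by 1/√(r·p_i). The rescaling convention and sampling with replacement are assumptions; details vary across the literature.

```python
import numpy as np

def leverage_sampling_sketch(X, r, rng):
    """Return sampled row indices and rescaling weights for leverage-score sampling.

    Leverage scores are the squared row norms of U from a thin SVD of X (they sum to p).
    Rows are sampled with replacement with probability proportional to their leverage
    score and rescaled by 1/sqrt(r * p_i), a common convention.
    """
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    lev = np.sum(U ** 2, axis=1)
    probs = lev / lev.sum()
    idx = rng.choice(X.shape[0], size=r, replace=True, p=probs)
    weights = 1.0 / np.sqrt(r * probs[idx])
    return idx, weights

rng = np.random.default_rng(1)
X = rng.standard_normal((5000, 20))
y = X @ rng.standard_normal(20) + rng.standard_normal(5000)

idx, w = leverage_sampling_sketch(X, r=200, rng=rng)
SX, Sy = w[:, None] * X[idx], w * y[idx]          # sketched data (SX, Sy)
beta_sk, *_ = np.linalg.lstsq(SX, Sy, rcond=None)  # LS on the sketched data
```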


Fast and Guaranteed Tensor Decomposition via Sketching

Tensor CANDECOMP/PARAFAC (CP) decomposition has wide applications in statistical learning of latent variable models and in data mining. In this paper, we propose fast and randomized tensor CP decomposition algorithms based on sketching. We build on the idea of count sketches, but introduce many novel ideas which are unique to tensors. We develop novel methods for randomized computation of tenso...
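The paper above builds on count sketches; as a standalone illustration of that building block only (not the authors' tensor CP decomposition algorithm), the snippet below count-sketches a single vector by hashing coordinates into r buckets with random signs.

```python
import numpy as np

def count_sketch(x, r, rng):
    """Minimal CountSketch of a vector: each coordinate is hashed to one of r
    buckets and accumulated with a random +/-1 sign. Squared norms and inner
    products are preserved in expectation."""
    n = x.shape[0]
    buckets = rng.integers(0, r, size=n)        # hash h: [n] -> [r]
    signs = rng.choice([-1.0, 1.0], size=n)     # sign s: [n] -> {+1, -1}
    sk = np.zeros(r)
    np.add.at(sk, buckets, signs * x)           # unbuffered accumulation into buckets
    return sk

rng = np.random.default_rng(2)
x = rng.standard_normal(10000)
sk = count_sketch(x, r=256, rng=rng)
print(np.dot(x, x), np.dot(sk, sk))  # the two squared norms agree in expectation
```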


Iterative Hessian Sketch: Fast and Accurate Solution Approximation for Constrained Least-Squares

We study randomized sketching methods for approximately solving a least-squares problem with a general convex constraint. The quality of a least-squares approximation can be assessed in different ways: either in terms of the value of the quadratic objective function (cost approximation), or in terms of some distance measure between the approximate minimizer and the true minimizer (solution approx...
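For intuition, here is a minimal, unconstrained reading of an iterative Hessian-sketch update with a fresh Gaussian sketch at every step: the Hessian AᵀA is replaced by (SA)ᵀ(SA) while the gradient uses the full data. The paper itself treats general convex constraints and other sketch types, so this is only an illustrative sketch under those simplifying assumptions.

```python
import numpy as np

def iterative_hessian_sketch(A, b, r, iters, rng):
    """Unconstrained iterative Hessian sketch (simplified): at each step the
    Hessian A^T A is approximated by (S A)^T (S A) for a fresh Gaussian sketch S,
    while the gradient of 0.5*||Ax - b||^2 is computed exactly."""
    n, p = A.shape
    x = np.zeros(p)
    for _ in range(iters):
        S = rng.standard_normal((r, n)) / np.sqrt(r)
        SA = S @ A
        grad = A.T @ (b - A @ x)                    # exact (negative) gradient
        x = x + np.linalg.solve(SA.T @ SA, grad)    # sketched Newton-like step
    return x

rng = np.random.default_rng(3)
n, p, r = 8000, 40, 400
A = rng.standard_normal((n, p))
b = A @ rng.standard_normal(p) + rng.standard_normal(n)

x_ihs = iterative_hessian_sketch(A, b, r=r, iters=5, rng=rng)
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.linalg.norm(x_ihs - x_ls))  # shrinks toward 0 as the iterations proceed
```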


Sketched Ridge Regression: Optimization Perspective, Statistical Perspective, and Model Averaging

We address the statistical and optimization impacts of using classical sketch versus Hessian sketch to approximately solve the Matrix Ridge Regression (MRR) problem. Prior research has considered the effects of classical sketch on least squares regression (LSR), a strictly simpler problem. We establish that classical sketch has a similar effect upon the optimization properties of MRR as it does...
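To illustrate the distinction discussed above, the snippet below contrasts a classical-sketch ridge estimator (sketch both A and b, then solve the small problem) with a Hessian-sketch ridge estimator (sketch only the quadratic term, keep the exact Aᵀb). The Gaussian sketch and the unscaled regularization parameter are assumptions for illustration, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, r, lam = 10000, 30, 300, 1.0

A = rng.standard_normal((n, p))
b = A @ rng.standard_normal(p) + rng.standard_normal(n)

S = rng.standard_normal((r, n)) / np.sqrt(r)   # Gaussian sketch (one of many choices)
SA, Sb = S @ A, S @ b

# Exact ridge solution: argmin ||Ax - b||^2 + lam * ||x||^2.
x_exact = np.linalg.solve(A.T @ A + lam * np.eye(p), A.T @ b)

# Classical sketch: sketch both A and b, then solve the reduced ridge problem.
x_classical = np.linalg.solve(SA.T @ SA + lam * np.eye(p), SA.T @ Sb)

# Hessian sketch: sketch only the quadratic (Hessian) term, keep the exact A^T b.
x_hessian = np.linalg.solve(SA.T @ SA + lam * np.eye(p), A.T @ b)

print(np.linalg.norm(x_classical - x_exact), np.linalg.norm(x_hessian - x_exact))
```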


Near Optimal Sketching of Low-Rank Tensor Regression

We study the least squares regression problem min_{Θ ∈ S_{D,R}} ‖AΘ − b‖_2, where S_{D,R} is the set of Θ for which Θ = ∑_{r=1}^{R} θ_1^{(r)} ◦ · · · ◦ θ_D^{(r)} for vectors θ_d^{(r)} ∈ R^{p_d} for all r ∈ [R] and d ∈ [D], and ◦ denotes the outer product of vectors. That is, Θ is a low-dimensional, low-rank tensor. This is motivated by the fact that the number of parameters in Θ is only R · ∑_{d=1}^{D} p_d, which is significantly...
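As a small illustration of the parameterization described above (for D = 2 only, and without any sketching), the snippet below builds a rank-R parameter Θ from outer products, compares its parameter count R·(p₁ + p₂) with the unconstrained count p₁·p₂, and evaluates the least-squares objective with Θ treated as a vector; all sizes are made-up examples.

```python
import numpy as np

rng = np.random.default_rng(5)
p1, p2, R, n = 30, 40, 3, 500

# Rank-R parameterization Theta = sum_r theta_1^(r) (outer) theta_2^(r), i.e. the D = 2 case.
theta1 = rng.standard_normal((R, p1))
theta2 = rng.standard_normal((R, p2))
Theta = np.einsum('ri,rj->ij', theta1, theta2)

# Parameter count R*(p1 + p2) versus the unconstrained p1*p2.
print(R * (p1 + p2), "vs", p1 * p2)

# Least-squares objective ||A Theta - b||_2 with Theta vectorized.
A = rng.standard_normal((n, p1 * p2))
b = rng.standard_normal(n)
print(np.linalg.norm(A @ Theta.ravel() - b))
```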



Journal:
  • Journal of Machine Learning Research

Volume 17  Issue 

Pages  -

Publication date 2016